Each instance of a Type with a TypeKind of “Character” is what Unicode terms a “code unit”. Note that a single “code point” may be represented by multiple “code units” (each letter, punctuation mark, symbol, etc. in a character set is typically associated with one “code point”).
A “code unit” is typically a fixed number of bytes (in Unicode: 1 byte for UTF-8, 2 bytes for UTF-16, 4 bytes for UTF-32). The endianness is format dependent – for example, in MXF, UTF-16 “code units” are big-endian (“UTF-16BE”).
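The distinction between “code points” and “code units”, and the effect of endianness, can be illustrated with a short Python sketch. The helper name `utf16_code_units` and the sample string are illustrative assumptions, not part of any register or file format.

```python
def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units of `text` as integers."""
    raw = text.encode("utf-16-be")  # big-endian, no byte order mark
    # Each UTF-16 code unit occupies exactly 2 bytes.
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


# 'A' (U+0041) plus MUSICAL SYMBOL G CLEF (U+1D11E), which lies outside
# the Basic Multilingual Plane and so needs two UTF-16 code units.
sample = "A\U0001D11E"

code_points = [ord(ch) for ch in sample]
code_units = utf16_code_units(sample)

print([hex(cp) for cp in code_points])   # ['0x41', '0x1d11e']
print([hex(cu) for cu in code_units])    # ['0x41', '0xd834', '0xdd1e']

# Same code units, different byte order on the wire:
print(sample.encode("utf-16-be").hex())  # 0041d834dd1e
print(sample.encode("utf-16-le").hex())  # 410034d81edd
```

Two code points become three code units here, so a data instance of a UTF-16 “Character” Type may hold only half of a surrogate pair.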
The Definition field must include the official IANA character set “Name” (see http://www.iana.org/assignments/character-sets/character-sets.xhtml). This can be interpreted as a statement of both:
- the allowed “code points” that can be represented by data instances of this Type
- the byte encoding of data instances of this Type – this is only relevant to certain implementations. For example, it is relevant for an MXF file, because each data instance is KLV encoded as a sequence of bytes. It is not relevant for a Reg-XML document, because all data is encoded as text anyway; the method used to encode the entire Reg-XML document as bytes is described independently at the top of the Reg-XML document.
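For implementations where the byte encoding matters, one possible approach is to map the IANA character set name to a decoder. The following is a minimal sketch using Python's standard `codecs` module; the function name `decode_character_data` is hypothetical, and Python's codec aliases cover many, but not all, IANA names, so a lookup failure has to be handled.

```python
import codecs


def decode_character_data(raw: bytes, iana_name: str) -> str:
    """Decode `raw` using the character set named in a Type's Definition.

    `iana_name` is assumed to have been read from the Definition field
    by a person or by some earlier step.
    """
    try:
        codec = codecs.lookup(iana_name)
    except LookupError as exc:
        # Not every IANA name has a Python codec or alias.
        raise ValueError(f"no Python codec for IANA name {iana_name!r}") from exc
    return raw.decode(codec.name)


# Two UTF-16BE code units, as they might appear in a KLV-encoded value.
value = bytes.fromhex("00480069")
print(decode_character_data(value, "UTF-16BE"))  # Hi
print(decode_character_data(b"Hi", "US-ASCII"))  # Hi
```

A Reg-XML reader would not need this step, since the XML parser has already decoded the whole document to text.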
Note that it is essential to read and interpret the Definition field to understand how each Type with a TypeKind of “Character” is to be handled – the entry in the Types Register provides no reliable, machine-readable way to signal this.
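Because the character set is only stated in free text, any automated handling is necessarily a best-effort heuristic. The sketch below scans a Definition string for names from a shortlist; the shortlist, the function name `find_charset_names`, and the example Definition text are all made up for illustration, and the result should be confirmed by a person.

```python
import re

# Hypothetical shortlist of IANA names an implementation chooses to support.
KNOWN_CHARSET_NAMES = ["US-ASCII", "UTF-8", "UTF-16", "UTF-16BE"]


def find_charset_names(definition: str, known_names=KNOWN_CHARSET_NAMES) -> list:
    """Return the known character set names that appear in `definition`."""
    found = []
    for name in known_names:
        # Require non-name characters on both sides, so that "UTF-16"
        # is not reported when the text actually says "UTF-16BE".
        pattern = r"(?<![A-Za-z0-9_-])" + re.escape(name) + r"(?![A-Za-z0-9_-])"
        if re.search(pattern, definition, flags=re.IGNORECASE):
            found.append(name)
    return found


# Made-up Definition text, not copied from any register entry.
example_definition = "A single code unit of UTF-16BE text."
print(find_charset_names(example_definition))  # ['UTF-16BE']
```

An empty or multi-valued result is a signal that the Definition needs manual interpretation.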
Note that the definitions of the “Character” TypeKind given above, in AAF, and in Reg-XML, are all slightly different.